What This Page Covers
Every researcher knows the feeling: you submit a paper, wait three to six months for reviews, and then discover that two of the three reviewers flagged problems you could have caught yourself — an unclear argument in Section 3, a missing baseline comparison, a citation that does not actually support the claim you made. These are the kinds of issues that AI can help you identify before submission.
This supplementary page introduces the concept of AI-assisted pre-review: using large language models to simulate aspects of peer review on your own drafts. The goal is not to replace peer review — that remains essential — but to prepare for it. Think of it as a dress rehearsal. You would not give a conference talk without practising it first, so why submit a paper without stress-testing it?
We will cover what AI can usefully check, what it cannot reliably judge, how to structure multi-perspective reviews, practical prompting strategies for getting useful feedback, and the important limitations you need to keep in mind. We will also introduce the /paper-review skill in Claude Code, which implements many of these ideas as an automated multi-agent workflow.
🤔 Why Pre-Review Your Own Work with AI?
The traditional peer review process, for all its importance, has well-known limitations as a feedback mechanism for individual authors. Understanding these limitations helps explain why AI pre-review is a genuinely useful addition to the research workflow.
- Peer review is slow. Turnaround times of three to six months are common in most fields. In fast-moving areas like machine learning or genomics, your paper may be partially outdated by the time you receive feedback. AI pre-review gives you structured feedback in minutes, letting you iterate faster before submission.
- You typically get very few reviews. Most journals assign two to three reviewers. That is a tiny sample of perspectives on your work. Each reviewer brings their own expertise, blind spots, and priorities. AI can simulate a broader range of review perspectives — not perfectly, but enough to surface issues that a single self-review would miss.
- This is not about replacing peer review. Let us be clear about what AI pre-review is and is not. It is a preparation tool, not a substitute. Real peer reviewers bring deep domain knowledge, awareness of the current state of debates in your field, and the ability to judge genuine novelty. AI cannot do these things reliably. What AI can do is catch the surface-level problems — logical inconsistencies, unclear writing, missing methodological details — so that real reviewers can focus their limited attention on substantive issues.
- The dress rehearsal analogy. Musicians rehearse before concerts. Lawyers prepare for cross-examination by having colleagues challenge their arguments. Scientists present at lab meetings before conferences. AI pre-review is the same idea applied to written research. You are not asking the AI whether your research is good; you are asking it to find the places where your presentation of the research could be stronger.
✅ What AI Can Usefully Check
Not all dimensions of peer review are equally suited to AI assistance. The following six areas correspond to real criteria that reviewers evaluate, ordered roughly from where AI is most helpful to where it is least reliable. Understanding this hierarchy helps you allocate your AI pre-review efforts wisely.
📜 Logical Consistency
Does your abstract promise what your conclusion delivers? Do your methods actually support the claims in your results discussion? Are there internal contradictions between sections?
AI is genuinely good at catching these kinds of internal coherence problems. Because it processes the entire paper at once, it can identify when Section 4 contradicts something stated in Section 2 — something that human authors, who write sections at different times, frequently miss. Ask AI to compare your abstract claims against your actual findings, and to flag any place where your paper promises something it does not deliver.
✍️ Writing Quality
Clarity, structure, flow, redundancy, and organisation. Is the paper well-structured? Are there sections that could be tighter? Are transitions between sections smooth? Is the level of detail appropriate?
This is probably AI's strongest review dimension. Language models are trained on enormous quantities of well-written text, and they are effective at identifying unclear sentences, redundant passages, awkward phrasing, and structural problems. They can suggest where paragraphs should be split, where transitions are needed, and where the argument loses the reader. For non-native English speakers, this dimension alone can significantly improve a paper's chances.
📚 Positioning & Related Work
Has the paper cited the most relevant prior work? Are the novelty claims reasonable given what already exists? Is the paper positioned clearly within the existing literature?
AI with web search capabilities can surface papers you may have missed and identify gaps in your related work section. However, the lesson from Week 5 applies here with full force: any citation AI suggests must be independently verified before inclusion. AI can also help you evaluate whether your novelty claims are appropriately scoped — though it cannot definitively judge whether something is truly novel in your specific niche.
🔬 Experimental Methodology
Are there missing baselines? Are error bars or confidence intervals included? Is the evaluation metric appropriate for the task? Is the experimental setup described with enough detail for replication?
AI can flag obvious methodological gaps — a classification paper without precision/recall, a comparison without standard baselines, missing details about hyperparameters or random seeds. However, it cannot judge whether your specific experimental design is sound for your field. It knows general methodological standards but not the specific conventions of your sub-area. Use it to catch what is missing, not to validate what is present.
📊 Statistical Reporting
Are statistics reported correctly? Are p-values, confidence intervals, and effect sizes presented appropriately? Are sample sizes adequate? Is the statistical test appropriate for the data?
AI can check whether statistical reporting follows standard conventions and whether the numbers in your text match those in your tables. Specialised tools like Statcheck can catch numerical inconsistencies — for instance, a reported p-value that does not match the reported test statistic and degrees of freedom. This is one area where automated checking genuinely catches errors that human reviewers often miss.
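The core of this kind of consistency check can be sketched mechanically. Below is a minimal, stdlib-only illustration for a two-tailed z-test: recompute the p-value from the reported statistic and flag a mismatch. (Statcheck itself is an R package with far broader coverage, including t, F, and chi-square tests; this sketch only shows the idea.)

```python
import math

def two_tailed_p_from_z(z: float) -> float:
    """Two-tailed p-value for a z statistic, via the normal survival function."""
    return math.erfc(abs(z) / math.sqrt(2))

def check_reported_p(z: float, reported_p: float, tol: float = 0.005) -> bool:
    """Return True if the reported p-value agrees with the recomputed one."""
    return abs(two_tailed_p_from_z(z) - reported_p) <= tol

# z = 1.96 corresponds to p ≈ 0.05 (two-tailed)
print(check_reported_p(1.96, 0.05))  # consistent report
print(check_reported_p(1.96, 0.50))  # inconsistent: likely a typo in the manuscript
```

The same recompute-and-compare logic generalises to any test statistic whose distribution you can evaluate, which is exactly what makes this class of errors so amenable to automated checking.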
📊 Figures and Tables
Do captions accurately describe what is shown? Are axes labelled with units? Is the visual presentation clear and readable? Do tables have appropriate headers and formatting?
AI can flag missing axis labels, captions that do not match the content of the figure, tables without proper headers, and figures that are referenced in the text but not included (or included but never referenced). What it cannot judge is aesthetic effectiveness — whether a figure is the best way to present the data, or whether a different visualisation would communicate the result more clearly. That remains a human judgment.
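The cross-referencing part of this check is simple enough to script yourself. Here is an illustrative sketch (the regex and figure labels are assumptions for demonstration, not a standard tool) that compares "Figure N" mentions in the text against the figures actually included:

```python
import re

def figure_reference_gaps(text: str, included_figures: set[str]) -> dict:
    """Compare 'Figure N' mentions in the text against the figures included."""
    referenced = set(re.findall(r"Figure\s+(\d+)", text))
    return {
        "referenced_but_missing": sorted(referenced - included_figures),
        "included_but_never_referenced": sorted(included_figures - referenced),
    }

body = "As Figure 1 shows, the effect is robust, consistent with Figure 3."
gaps = figure_reference_gaps(body, included_figures={"1", "2"})
print(gaps)  # Figure 3 is cited but absent; Figure 2 is present but never cited
```

A real manuscript would need variants for "Fig.", subfigure labels, and table references, but the set-difference logic stays the same.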
🚫 What AI Cannot Reliably Judge
Understanding the boundaries of AI review is just as important as understanding its capabilities. The following are areas where AI judgment is less reliable — but this comes with an important caveat: even in these areas, AI may occasionally flag something you missed precisely because you are too close to your own work. Authors have blind spots. You know what you meant to write, which can prevent you from seeing what you actually wrote. An AI reader does not share your assumptions, and that outsider perspective can sometimes surface issues that you and your colleagues have all overlooked. So treat these limitations as reasons for caution, not for dismissal.
- True novelty. AI cannot know what is genuinely new in your specific sub-field. It has broad knowledge from its training data, but it does not have an up-to-the-minute understanding of what has been tried, what has failed, and what the community considers an open problem. A human reviewer in your field can immediately recognise whether your contribution is novel; an AI can only check whether it sounds novel based on what it has seen before. These are very different things.
- Domain conventions. Every field has unwritten rules about methodology, presentation, and argument style that AI may not know. Mathematicians expect a certain structure for proofs. Clinical researchers have specific reporting standards (CONSORT, STROBE). Qualitative researchers have their own criteria for rigour. AI may apply generic standards that miss field-specific expectations, or worse, it may confidently suggest changes that violate your field's norms.
- Significance. Whether your contribution matters to your community is a human judgment. A technically correct paper can still be uninteresting, incremental, or solving a problem that nobody cares about. AI cannot judge the sociology of your field — which questions are exciting, which results would surprise people, which contributions would change how people think. This is perhaps the most important dimension of peer review, and it is entirely beyond AI's reach.
- Ethical concerns. Whether your research raises ethical issues that reviewers would flag — informed consent problems, dual-use concerns, potential for harm to vulnerable populations — requires contextual understanding that AI handles poorly. AI may flag obvious ethical issues (like missing IRB approval) but will miss subtle concerns that arise from understanding the specific populations, contexts, or applications involved.
- Reviewer taste and politics. Real reviewers have preferences, agendas, and blind spots that no AI can simulate. Some reviewers favour certain methodological approaches. Some have strong opinions about theoretical frameworks. Some are generous with borderline papers; others are not. The interpersonal and political dimensions of peer review — which are real and consequential — are invisible to AI.
🤝 A Multi-Agent Review Approach
One of the most effective ways to get useful AI feedback is to simulate the structure of real peer review, where multiple reviewers with different expertise examine the same paper. Rather than asking a single AI for a generic review, you can create specialist "reviewers" who each focus on a different dimension of quality.
⚙️ Creating Specialist AI Reviewers
One powerful approach is to prompt AI to review your paper from multiple perspectives simultaneously. Rather than asking "review my paper" (which produces generic, surface-level feedback), you can create specialist reviewers, each with a defined focus and expertise:
- A Theory reviewer who checks logical rigour, proof correctness, and whether the theoretical contributions are sound and well-motivated. This reviewer looks for gaps in arguments, unstated assumptions, and logical leaps.
- An Experiments reviewer who evaluates methodology, baselines, statistical reporting, and reproducibility. This reviewer checks whether the experimental design supports the claims, whether comparisons are fair, and whether there is enough detail to replicate the work.
- A Writing reviewer who focuses on clarity, structure, flow, and presentation. This reviewer identifies unclear passages, redundant sections, missing transitions, and places where the paper loses the reader.
- A Positioning reviewer who checks related work coverage, novelty claims, and how the paper situates itself in the existing literature. This reviewer looks for missing citations, overclaimed novelty, and inadequate discussion of competing approaches.
- A Devil's Advocate whose sole job is to construct the strongest possible objection to your main claim. This is often the most valuable reviewer, because it forces you to confront weaknesses you cannot see due to your own belief in your thesis.
Running these specialist reviews in parallel and then synthesising the results gives you a much more thorough and actionable review than a single generic prompt. Each reviewer catches things the others miss, just as real multi-reviewer peer review works better than a single opinion.
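To make the structure concrete, here is a minimal sketch of how the panel above might be wired up. Only the prompt construction is shown; dispatching each prompt to a model API in parallel depends on your provider, so that step is deliberately omitted. The persona wording and function names are illustrative assumptions, not a fixed interface.

```python
# Sketch of a specialist-reviewer panel. Each persona mirrors one of the
# reviewers described above; sending the prompts to a real model API is
# left out, since that part is provider-specific.

REVIEWERS = {
    "theory": "Check logical rigour, proof correctness, unstated assumptions, and logical leaps.",
    "experiments": "Check methodology, baselines, statistical reporting, and reproducibility.",
    "writing": "Check clarity, structure, flow, redundancy, and missing transitions.",
    "positioning": "Check related-work coverage, novelty claims, and missing citations.",
    "devils_advocate": "Construct the strongest possible objection to the paper's main claim.",
}

def build_review_prompts(paper_text: str, venue: str) -> dict[str, str]:
    """One focused prompt per specialist, ready to send in parallel."""
    return {
        role: (
            f"You are a peer reviewer for {venue}. {focus}\n"
            "List the 5 most significant issues, with exact section numbers "
            "and a concrete suggested fix for each, ranked by severity.\n\n"
            f"PAPER:\n{paper_text}"
        )
        for role, focus in REVIEWERS.items()
    }

prompts = build_review_prompts("(full paper text here)", venue="NeurIPS")
print(len(prompts))  # one prompt per specialist reviewer
```

Keeping each prompt narrowly scoped is the point: a reviewer asked to check everything checks nothing in depth.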
/paper-review skill in Claude Code: This multi-agent approach is available as a built-in skill. When you run /paper-review, it spawns specialist agents in parallel — each with a distinct review focus — synthesises their findings, ranks issues by severity, and produces a mock review formatted in the style of your target venue (NeurIPS, Nature, Science, and many other major venues are supported with venue-specific rubrics). Critically, the human always gates implementation: the skill identifies problems and suggests fixes, but no changes are made to your paper without your explicit approval. You remain in control of every edit.
🛠️ Practical Guide — How to Get Good Feedback
The quality of AI review feedback depends heavily on how you ask for it. Vague requests produce vague responses. Specific, well-structured prompts produce actionable criticism. The following steps represent best practice for getting AI feedback that actually improves your paper.
- Provide the full paper, not just an abstract. AI needs context to give useful feedback. An abstract alone does not contain enough information for meaningful review. If your paper is too long for the context window, provide the full text of the sections you want reviewed along with the abstract and conclusion for context. A reviewer who has only read your abstract cannot give you useful feedback on your methodology — and neither can an AI.
- Specify your target venue. Telling the AI "this paper is being submitted to [journal or conference name]" helps it calibrate expectations. A paper for Nature needs to emphasise broad significance and accessibility. A paper for a specialist journal needs depth and technical precision. A NeurIPS submission needs strong experimental methodology and clear novelty claims. The venue context shapes what counts as a weakness.
- Ask for specific, located criticism. "Identify the 5 most significant weaknesses in this paper, with exact section numbers and specific suggestions for improvement" is dramatically better than "any feedback?" The former forces the AI to prioritise and be concrete. The latter invites a list of generic platitudes. Always ask for section-level or paragraph-level specificity, and always ask for a ranked list so you know what to fix first.
- Request the Devil's Advocate explicitly. "Construct the strongest possible argument against the main claim of this paper" is one of the most valuable prompts you can use. It forces the AI to adopt an adversarial stance rather than a supportive one. The issues it surfaces are often the ones you cannot see yourself, because you believe in your own thesis. This is not comfortable, but it is useful. The best papers are the ones that have anticipated and addressed the strongest objections.
- Iterate: fix, then review again. Real peer review is iterative — you receive feedback, revise, and often go through multiple rounds. Your AI pre-review should work the same way. Fix the issues from the first round, then run the review again. New issues will surface because changes to one section can create problems in another. Two or three rounds of AI review, with revisions in between, are far more valuable than a single pass.
⚠️ The Limitations — An Honest Warning
We have discussed what AI review can and cannot do, but there are deeper risks that deserve explicit attention. AI pre-review, used carelessly, can actually make your paper worse by creating false confidence or by optimising for the wrong things.
- False confidence from surface-level feedback. If your AI pre-review only finds minor issues — typos, unclear sentences, a missing comma — you may conclude that the paper is in good shape. But the AI may have missed the biggest weakness because that weakness is specific to your field and requires domain expertise to recognise. A clean AI review does not mean a clean paper.
- Generic praise that feels good but changes nothing. If you prompt AI in a way that invites positive feedback ("What are the strengths of this paper?"), it will happily produce paragraphs of praise. This feels validating but does not improve your work. Always bias your prompts toward criticism. You want the AI to find problems, not to make you feel good.
- Optimising for AI-reviewable dimensions at the expense of others. If you polish writing quality and logical consistency (dimensions AI is good at checking) but neglect novelty, significance, and domain positioning (dimensions AI cannot check), you may end up with a beautifully written paper that is rejected for being insufficiently novel or significant.
- Over-reliance on a single perspective. Even the multi-agent approach described above is still fundamentally one system with one training distribution. It cannot replace the genuine diversity of perspectives that comes from having your paper read by colleagues, mentors, and domain experts. AI pre-review should supplement, not replace, human feedback from people who know your field.
📖 Scenario: The False Confidence Trap
Consider this scenario: A student runs their thesis chapter through AI review. The AI is thorough — it flags 15 issues across the chapter. There are two typos in Section 2, three unclear sentences in Section 3, a paragraph in Section 4 that could be tightened, a missing transition between Sections 5 and 6, and several minor formatting inconsistencies. The student diligently fixes all 15 issues and feels confident that the chapter is now polished and ready.
But when the chapter goes to the thesis supervisor, the feedback is very different. The supervisor identifies a fundamental methodological flaw: the statistical test used in the analysis is inappropriate for the data distribution, and the conclusions drawn from it do not follow. This is the kind of domain-specific judgment that the AI was never equipped to make. It could check whether the statistics were reported correctly, but it could not judge whether the choice of statistical test was appropriate for this specific research context.
The lesson: AI review is a complement to domain expertise, not a replacement. Use it to handle the surface issues so that your supervisor, committee, and peer reviewers can focus their expertise on the substantive questions that only humans can judge. That division of labour is what makes AI pre-review genuinely valuable.
🎓 Venue-Specific Review Standards
One of the most important things to understand about peer review is that standards vary enormously between venues. A paper that is excellent by the standards of one journal may be entirely wrong for another. Your AI pre-review needs to account for this.
- Machine learning conferences (NeurIPS, ICML, ICLR): These venues prioritise technical correctness, clear novelty over prior work, strong experimental methodology with proper baselines, ablation studies, and reproducibility. Papers are expected to include code availability statements and detailed experimental setups. The review process is typically double-blind with a rebuttal phase.
- General science journals (Nature, Science, PNAS): These venues prioritise significance, broad interest beyond a single sub-field, methodological rigour, and clear communication to a wide scientific audience. A technically impressive result that matters only to specialists may not be suitable. The bar for novelty and impact is extremely high.
- Domain-specific journals: Every field has its own conventions for methodology, reporting, argument structure, and presentation. Clinical journals require CONSORT or STROBE checklists. Social science journals expect specific approaches to validity and reflexivity. Mathematics journals have their own norms for proof presentation. AI cannot know all of these conventions, but it can be guided by telling it which venue you are targeting.
- The /paper-review skill supports venue-specific rubrics for many major venues. When you specify a target venue, the review is calibrated to that venue's known criteria, review form structure, and common reasons for rejection. This is not a perfect simulation of the venue's review process, but it is substantially more useful than a generic review that ignores venue context entirely.
Key Takeaways
- AI pre-review catches surface issues before peer reviewers see them. Logical inconsistencies, unclear writing, missing methodological details, and reporting errors are all dimensions where AI feedback is genuinely useful. Fixing these before submission means that real reviewers can focus their expertise on the substantive questions that matter most.
- Multiple specialist "reviewers" give better feedback than one generic review. Structuring your AI review as a panel — with dedicated theory, experiments, writing, positioning, and devil's advocate perspectives — produces more thorough and actionable feedback than a single prompt asking for general comments.
- AI cannot judge true novelty, domain conventions, or significance. These are the dimensions that matter most in peer review, and they remain firmly in the domain of human expertise. No amount of AI pre-review can tell you whether your contribution is genuinely important to your field.
- Use AI review to prepare for peer review, not to replace it. The goal is preparation, not substitution. AI pre-review and human peer review serve different functions, and both are necessary. Think of AI as your first reader, not your final judge.
- The /paper-review skill in Claude Code implements this multi-agent approach. It provides venue-specific, multi-perspective review with severity rankings and human-gated implementation. If you use Claude Code for your research workflow, this skill is a practical way to apply the concepts from this page.